Greeley
Empowering GraphRAG with Knowledge Filtering and Integration
Guo, Kai, Shomer, Harry, Zeng, Shenglai, Han, Haoyu, Wang, Yu, Tang, Jiliang
In recent years, large language models (LLMs) have revolutionized the field of natural language processing. However, they often suffer from knowledge gaps and hallucinations. Graph retrieval-augmented generation (GraphRAG) enhances LLM reasoning by integrating structured knowledge from external graphs. However, we identify two key challenges that plague GraphRAG: (1) retrieving noisy and irrelevant information can degrade performance, and (2) excessive reliance on external knowledge suppresses the model's intrinsic reasoning. To address these issues, we propose GraphRAG-FI (Filtering and Integration), consisting of GraphRAG-Filtering and GraphRAG-Integration. GraphRAG-Filtering employs a two-stage filtering mechanism to refine retrieved information. GraphRAG-Integration employs a logits-based selection strategy to balance external knowledge from GraphRAG with the LLM's intrinsic reasoning, reducing over-reliance on retrievals. Experiments on knowledge graph QA tasks demonstrate that GraphRAG-FI significantly improves reasoning performance across multiple backbone models, establishing a more reliable and effective GraphRAG framework.
- North America > United States > Colorado > Weld County > Greeley (0.14)
- Asia > Myanmar > Tanintharyi Region > Dawei (0.04)
- North America > United States > Oregon (0.04)
- North America > United States > Michigan (0.04)
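The GraphRAG-FI abstract above mentions a logits-based selection strategy but does not spell it out. The following is only a minimal sketch of one way such a selection could work, not the authors' method: it assumes the model scores the same candidate answers twice (once without retrieval, once with GraphRAG context) and keeps the answer from whichever pass is more confident. The function names and the confidence rule are hypothetical.

```python
import math


def softmax(logits):
    """Convert raw logits to probabilities (numerically stable)."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def select_answer(llm_only_logits, graphrag_logits, candidates):
    """Return (answer, source) from the more confident of two passes.

    Assumption (not from the paper): confidence is the top softmax
    probability over a shared list of candidate answers.
    """
    p_llm = softmax(llm_only_logits)
    p_rag = softmax(graphrag_logits)
    best_llm = max(range(len(candidates)), key=lambda i: p_llm[i])
    best_rag = max(range(len(candidates)), key=lambda i: p_rag[i])
    # Ties go to the retrieval-augmented pass.
    if p_rag[best_rag] >= p_llm[best_llm]:
        return candidates[best_rag], "graphrag"
    return candidates[best_llm], "llm"
```

With this rule, a confident retrieval-free prediction can override a weak retrieval-augmented one, which is one way to limit over-reliance on retrieved context.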
WavePulse: Real-time Content Analytics of Radio Livestreams
Mittal, Govind, Gupta, Sarthak, Wagle, Shruti, Chopra, Chirag, DeMattee, Anthony J, Memon, Nasir, Ahamad, Mustaque, Hegde, Chinmay
Radio remains a pervasive medium for mass information dissemination, with AM/FM stations reaching more Americans than either smartphone-based social networking or live television. Increasingly, radio broadcasts are also streamed online and accessed over the Internet. We present WavePulse, a framework that records, documents, and analyzes radio content in real-time. While our framework is generally applicable, we showcase the efficacy of WavePulse in a collaborative project with a team of political scientists focusing on the 2024 Presidential Elections. We use WavePulse to monitor livestreams of 396 news radio stations over a period of three months, processing close to 500,000 hours of audio streams. These streams were converted into time-stamped, diarized transcripts and analyzed to answer key political science questions at both the national and state levels. Our analysis revealed how local issues interacted with national trends, providing insights into information flow. Our results demonstrate WavePulse's efficacy in capturing and analyzing content from radio livestreams sourced from the Web. Code and dataset can be accessed at https://wave-pulse.io.
- Asia > Middle East > UAE > Abu Dhabi Emirate > Abu Dhabi (0.14)
- North America > United States > New York > Kings County > New York City (0.04)
- North America > United States > Washington > King County > Seattle (0.04)
- Media > Radio (1.00)
- Leisure & Entertainment (1.00)
- Government > Voting & Elections (1.00)
- Government > Regional Government > North America Government > United States Government (1.00)
Ancient but Digitized: Developing Handwritten Optical Character Recognition for East Syriac Script Through Creating KHAMIS Dataset
Majeed, Ameer, Hassani, Hossein
Many languages have vast amounts of handwritten texts, such as ancient scripts about folktale stories and historical narratives or contemporary documents and letters. Digitization of those texts has various applications, such as daily tasks, cultural studies, and historical research. Syriac is an ancient, endangered, and low-resourced language that has not received the attention it requires and deserves. This paper reports on a research project aimed at developing an optical character recognition (OCR) model based on handwritten Syriac texts as a starting point to build more digital services for this endangered language. A dataset was created, KHAMIS (inspired by the East Syriac poet, Khamis bar Qardahe), which consists of handwritten sentences in the East Syriac script. We used it to fine-tune the Tesseract-OCR engine's pretrained Syriac model on handwritten data. The data was collected from volunteers capable of reading and writing in the language. KHAMIS currently consists of 624 handwritten Syriac sentences collected from 31 university students and one professor; it will be partially available online, with the whole dataset to follow in the near future for development and research purposes. The handwritten OCR model achieved a character error rate of 1.097-1.610% and 8.963-10.490% on the training and evaluation sets, respectively, and a character error rate of 18.89-19.71% and a word error rate of 62.83-65.42% on the test set, which is about twice as good as the default Syriac model of Tesseract.
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.14)
- Asia > Middle East > Iraq > Kurdistan Region > Duhok Governorate > Duhok (0.05)
- Asia > Middle East > Iraq > Erbil Governorate > Erbil (0.05)
- Government (0.93)
- Education > Educational Setting > Higher Education (0.39)
- Information Technology > Artificial Intelligence > Vision > Optical Character Recognition (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.94)
- Information Technology > Artificial Intelligence > Machine Learning > Pattern Recognition (0.85)
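The KHAMIS abstract above reports character error rate (CER) and word error rate (WER). As a reminder of how these metrics are commonly computed, here is a minimal sketch using the standard definition: Levenshtein edit distance divided by the reference length. It is not the authors' evaluation code, and the function names are illustrative.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (strings or lists)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def cer(reference, hypothesis):
    """Character error rate: character edits / reference characters."""
    if not reference:
        raise ValueError("reference must be non-empty")
    return edit_distance(reference, hypothesis) / len(reference)


def wer(reference, hypothesis):
    """Word error rate: word edits / reference words (whitespace split)."""
    ref_words = reference.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)
```

Because one wrong character makes the whole word count as an error, WER is typically much higher than CER on the same output, which is consistent with the gap between the two figures reported above.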
GitHub - tesseract-ocr/tesseract: Tesseract Open Source OCR Engine (main repository)
This package contains an OCR engine - libtesseract and a command line program - tesseract. Tesseract 4 adds a new neural net (LSTM) based OCR engine which is focused on line recognition, but also still supports the legacy Tesseract OCR engine of Tesseract 3 which works by recognizing character patterns. Compatibility with Tesseract 3 is enabled by using the Legacy OCR Engine mode (--oem 0). It also needs traineddata files which support the legacy engine, for example those from the tessdata repository. The lead developer is Ray Smith.
- North America > United States > Colorado > Weld County > Greeley (0.06)
- Europe > United Kingdom > England > Bristol (0.06)
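The Tesseract description above says the legacy engine is selected with `--oem 0`. A minimal sketch of assembling that command line from Python follows. The helper name `legacy_tesseract_command` is hypothetical, and running the command assumes the `tesseract` binary is installed along with traineddata files that support the legacy engine, as the description notes.

```python
import subprocess


def legacy_tesseract_command(image_path, lang="eng", tessdata_dir=None):
    """Build the argument list for a legacy-engine OCR run.

    "stdout" as the output base makes tesseract print the recognized
    text instead of writing a file.
    """
    cmd = ["tesseract", image_path, "stdout", "--oem", "0", "-l", lang]
    if tessdata_dir is not None:
        # Point at a directory holding legacy-capable traineddata files.
        cmd += ["--tessdata-dir", tessdata_dir]
    return cmd


def run_legacy_ocr(image_path, lang="eng", tessdata_dir=None):
    """Run tesseract and return its text output (requires the binary)."""
    cmd = legacy_tesseract_command(image_path, lang, tessdata_dir)
    result = subprocess.run(cmd, capture_output=True, text=True, check=True)
    return result.stdout
```

Keeping the command builder separate from the runner makes it possible to inspect or log the exact invocation without having Tesseract installed.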
Measure-Theoretic Probability of Complex Co-occurrence and E-Integral
Complex high-dimensional co-occurrence data increasingly arise from complex systems of interacting physical, biological and social processes, recorded in discretely indexed modifiable areal units or at continuously indexed locations of a study region for landscape-based mechanisms. Modeling, predicting and interpreting complex co-occurrences are general and fundamental problems of statistical and machine learning across a broad variety of modern real-world applications. Probability and conditional probability of co-occurrence are defined in a general setting with set functions, giving a rigorous measure-theoretic foundation for the inherent challenge of data sparseness, a main challenge in probabilistic modeling and reasoning about co-occurrence in statistical inference. The behavior of a class of natural integrals called E-integrals is investigated based on the defined conditional probability of co-occurrence, and results on the properties of the E-integral are presented. The paper offers a novel measure-theoretic framework in which the E-integral, as a basic measure-theoretic concept, can be the starting point for the expectation functional approach preferred by Whittle (1992) and Pollard (2001) to the development of probability theory. This framework addresses the inherent challenge of co-occurrences emerging in modern high-dimensional co-occurrence data problems and opens the door to more sophisticated research in complex high-dimensional co-occurrence data science.
- North America > United States > Colorado > Weld County > Greeley (0.13)
- North America > United States > New York (0.04)
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Semantic Structure based Query Graph Prediction for Question Answering over Knowledge Graph
Building query graphs from natural language questions is an important step in complex question answering over knowledge graphs (Complex KGQA). In general, a question can be correctly answered if its query graph is built correctly and the right answer is then retrieved by issuing the query graph against the KG. Therefore, this paper focuses on query graph generation from natural language questions. Existing approaches for query graph generation ignore the semantic structure of a question, resulting in a large number of noisy query graph candidates that undermine prediction accuracy. In this paper, we define six semantic structures from common questions in KGQA and develop a novel Structure-BERT to predict the semantic structure of a question. By doing so, we can first filter out noisy candidate query graphs, and then rank the remaining candidates with a BERT-based ranking model. Extensive experiments on two popular benchmarks, MetaQA and WebQuestionsSP (WSP), demonstrate the effectiveness of our method compared to state-of-the-art approaches.
- Europe > Portugal > Lisbon > Lisbon (0.04)
- North America > United States > Colorado > Weld County > Greeley (0.04)
- Media > Film (0.69)
- Leisure & Entertainment (0.69)
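The query graph abstract above describes a filter-then-rank pipeline: predict a question's semantic structure, discard candidates with a different structure, then rank what remains. The sketch below illustrates only that control flow. The structure labels ("chain", "star"), the candidate fields, and the token-overlap scorer standing in for the BERT-based ranker are all hypothetical, since the abstract does not name the six structures or describe the ranker's inputs.

```python
def filter_then_rank(candidates, predicted_structure, score):
    """Keep candidates matching the predicted structure, best score first."""
    kept = [c for c in candidates if c["structure"] == predicted_structure]
    return sorted(kept, key=score, reverse=True)


def overlap_score(question):
    """Stand-in ranker: count relation names that appear in the question."""
    q_tokens = set(question.lower().split())

    def score(candidate):
        return len(q_tokens & set(candidate["relations"]))

    return score


question = "who directed the film starring Tom Hanks"
candidates = [
    {"id": "g1", "structure": "chain", "relations": ["starring", "directed"]},
    {"id": "g2", "structure": "chain", "relations": ["starring", "written"]},
    {"id": "g3", "structure": "star", "relations": ["directed", "starring", "film"]},
]

# Assume the structure classifier predicted "chain" for this question.
ranked = filter_then_rank(candidates, "chain", overlap_score(question))
```

Here `g3` would score highest on overlap alone, but it is removed before ranking because its structure does not match the prediction, which is the kind of noise reduction the abstract attributes to the filtering step.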
How can AI help companies looking for vaccines? - Marketplace
The new coronavirus is now officially a pandemic, and researchers are speeding to discover, test and deploy a vaccine. Some are hoping that breakthrough biotechnology and artificial intelligence can get us there faster. Much of the funding and development of a COVID-19 vaccine is likely to happen privately. The $8 billion coronavirus funding bill passed by the U.S. government includes just $800 million for the National Institutes of Health, where official U.S. vaccine research and development happens, but which is chronically underfunded. I asked Michael Greeley, co-founder and general partner with biotech investment fund Flare Capital in Boston, what uses AI could have in dealing with coronavirus.
- North America > United States > Colorado > Weld County > Greeley (0.25)
- Europe > Netherlands > North Holland > Amsterdam (0.05)
- Europe > Italy (0.05)
- Asia > China (0.05)
Popular Machine Learning Projects on Github You must know!
GitHub has become the go-to source for all things open-source and contains tons of resources for Machine Learning practitioners. We bring you a list of the 10 GitHub repositories with the most stars. TensorFlow is an open source software library for numerical computation using data flow graphs. Nodes in the graph represent mathematical operations, while the graph edges represent the multidimensional data arrays (tensors) that flow between them. This flexible architecture lets you deploy computation to one or more CPUs or GPUs in a desktop, server, or mobile device without rewriting code.
Threat of Mass Shootings Leads to AI-Powered Cameras in US Schools
Paul Hildreth looked at images from security cameras set up at schools in Fulton County, Georgia. He began watching a video of a woman walking inside one of the school buildings. The top of her clothing was bright yellow. Hildreth used his computer's artificial intelligence, or AI, system to find other images of the woman. The system put the pictures together in a video that showed where she currently was, where she had been and what she was doing.
- North America > United States > Georgia > Fulton County (0.25)
- North America > United States > New York (0.05)
- North America > United States > Louisiana > Orleans Parish > New Orleans (0.05)
AI-powered cameras become new tool against mass shootings
In this July 30, 2019, photo, Paul Hildreth, emergency operations coordinator for the Fulton County School District, works in the emergency operations center at the Fulton County School District Administration Center in Atlanta. Artificial Intelligence is transforming surveillance cameras from passive sentries into active observers that can immediately spot a gunman, alert retailers when someone is shoplifting and help police quickly find suspects. Schools, such as the Fulton County School District, are among the most enthusiastic adopters of the technology. Paul Hildreth peered at a display of dozens of images from security cameras surveying his Atlanta school district and settled on one showing a woman in a bright yellow shirt walking a hallway. A mouse click instructed the artificial intelligence-equipped system to find other images of the woman, and it immediately stitched them into a video narrative of where she was currently, where she had been and where she was going. There was no threat, but Hildreth's demonstration showed what's possible with AI-powered cameras.
- North America > United States > New York (0.05)
- Oceania > New Zealand (0.05)
- North America > United States > Tennessee (0.05)
- Law Enforcement & Public Safety > Crime Prevention & Enforcement (1.00)
- Law (1.00)
- Education (1.00)